One Sense per Discourse
Abstract
It is well known that there are polysemous words like sentence whose "meaning" or "sense" depends on the context of use. We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). As this work was nearing completion, we observed a very strong discourse effect. That is, if a polysemous word such as sentence appears two or more times in a well-written discourse, it is extremely likely that all of its occurrences will share the same sense. This paper describes an experiment which confirmed this hypothesis and found that the tendency to share sense in the same discourse is extremely strong (98%). This result can be used as an additional source of constraint for improving the performance of the word-sense disambiguation algorithm. In addition, it could also be used to help evaluate disambiguation algorithms that did not make use of the discourse constraint.

2. Our Previous Work on Word-sense Disambiguation

2.1. Data Deprivation

Although there has been a long history of work on word-sense disambiguation, much of the work has been stymied by difficulties in acquiring appropriate testing and training materials. AI approaches have tended to focus on "toy" domains because of the difficulty in acquiring large lexicons. So too, statistical approaches, e.g., Kelly and Stone (1975), Black (1988), have tended to focus on a relatively small set of polysemous words because they have depended on extremely scarce hand-tagged materials for use in testing and training. We have achieved considerable progress recently by using a new source of testing and training materials and by applying Bayesian discrimination methods.
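To make the discourse constraint from the abstract concrete, the following is a minimal sketch of one way it might be applied after per-occurrence disambiguation: every occurrence of a word in a discourse is relabeled with the sense assigned most often in that discourse. The majority-vote rule and the function name `apply_one_sense_per_discourse` are assumptions made for illustration; the text above does not specify how the constraint is enforced.

```python
from collections import Counter


def apply_one_sense_per_discourse(predicted_senses):
    """Relabel all occurrences in one discourse with the majority sense.

    predicted_senses: list of sense labels, one per occurrence of the
    same polysemous word within a single discourse (hypothetical input).
    Ties go to the sense that was predicted first.
    """
    if not predicted_senses:
        return []
    majority_sense, _ = Counter(predicted_senses).most_common(1)[0]
    return [majority_sense] * len(predicted_senses)


# Three occurrences of "sentence" in one discourse, one of them mislabeled.
relabeled = apply_one_sense_per_discourse(["judicial", "syntactic", "judicial"])
```

The same per-discourse agreement could also serve as a rough evaluation signal: a disambiguator that often assigns different senses within one discourse is likely making errors.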
Rather than depending on small amounts of hand-tagged text, we have been making use of relatively large amounts of parallel text, such as the Canadian Hansards, which are available in multiple languages. The translation can often be used in lieu of hand-labeling. For example, consider the polysemous word sentence, which has two major senses: (1) a judicial sentence, and (2) a syntactic sentence. We can collect a number of sense (1) examples by extracting instances that are translated as peine, and we can collect a number of sense (2) examples by extracting instances that are translated as phrase. In this way, we have been able to acquire a considerable amount of testing and training material for developing and testing our disambiguation algorithms. The use of bilingual materials for discrimination decisions in machine translation has been discussed by Brown and others (1991), and by Dagan, Itai, and Schwall (1991). The use of bilingual materials for an essentially monolingual purpose, sense disambiguation, is similar in method, but differs in purpose.

2.2. Bayesian Discrimination

Surprisingly good results can be achieved using Bayesian discrimination methods, which have been used very successfully in many other applications, especially author identification (Mosteller and Wallace, 1964) and information retrieval (IR) (Salton, 1989, section 10.3). Our word-sense disambiguation algorithm uses the words in a 100-word context [1] surrounding the polysemous word very much like the other two applications use the words in a test document.

Information Retrieval (IR):

$$\prod_{w \,\in\, \mathrm{doc}} \frac{\Pr(w \mid \mathrm{rel})}{\Pr(w \mid \mathrm{irrel})}$$

[1] It is common to use very small contexts (e.g., 5 words) based on the observation that people do not need very much context in order to perform the disambiguation task. In contrast, we use much larger contexts (e.g., 100 words). Although people may be able to make do with much less context, we believe the machine needs all the help it can get, and we have found that the larger context makes the task much easier. In fact, we have been able to measure information at extremely large distances (10,000 words away from the polysemous word in question), though obviously most of the useful information appears relatively near the polysemous word (e.g., within the first 100 words or so). Needless to say, our 100-word contexts are considerably larger than the smaller 5-word windows that one normally finds in the literature.
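The two ideas in Sections 2.1 and 2.2 can be sketched together in a few lines of Python. The sketch below first labels English contexts of sentence by the French translation found in an aligned pair (peine or phrase), then scores a new context with a sum of log probability ratios, by analogy with the IR product above. Everything specific here is an assumption made for illustration: the function names, the toy sentence pairs, the add-one smoothing, the use of logarithms, and the choice to ignore words unseen in training. It is not the authors' implementation.

```python
import math
from collections import Counter


def label_by_translation(aligned_pairs, word, translation_to_sense):
    """Group English contexts of `word` by the sense its translation implies."""
    labelled = {sense: [] for sense in translation_to_sense.values()}
    for english, french in aligned_pairs:
        if word not in english:
            continue
        senses = {translation_to_sense[f] for f in french if f in translation_to_sense}
        if len(senses) == 1:  # skip pairs with no cue or with conflicting cues
            labelled[senses.pop()].append([t for t in english if t != word])
    return labelled


def train(contexts_by_sense, smoothing=1.0):
    """Estimate smoothed Pr(w | sense) from the labelled context words."""
    counts = {s: Counter(w for ctx in ctxs for w in ctx)
              for s, ctxs in contexts_by_sense.items()}
    vocab = set().union(*counts.values())
    model = {}
    for sense, c in counts.items():
        total = sum(c.values()) + smoothing * len(vocab)
        model[sense] = {w: (c[w] + smoothing) / total for w in vocab}
    return model


def log_ratio(model, context, sense1, sense2):
    """Sum of log Pr(w|sense1)/Pr(w|sense2) over context words seen in training."""
    return sum(math.log(model[sense1][w] / model[sense2][w])
               for w in context if w in model[sense1])


# Toy aligned sentence pairs (invented for this sketch, not Hansard data).
pairs = [
    ("the judge imposed a sentence of ten years".split(),
     "le juge a imposé une peine de dix ans".split()),
    ("the court reduced the prison sentence".split(),
     "le tribunal a réduit la peine de prison".split()),
    ("the verb ends the sentence".split(),
     "le verbe termine la phrase".split()),
    ("each sentence needs a subject and a verb".split(),
     "chaque phrase exige un sujet et un verbe".split()),
]

labelled = label_by_translation(pairs, "sentence",
                                {"peine": "judicial", "phrase": "syntactic"})
model = train(labelled)
judicial_score = log_ratio(model, "the judge and the prison".split(),
                           "judicial", "syntactic")
syntactic_score = log_ratio(model, "a verb and a subject".split(),
                            "judicial", "syntactic")
```

A positive score favors the first sense and a negative score favors the second. With a realistic corpus the context passed to `log_ratio` would be the 100-word window around the polysemous word rather than a single short sentence.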
Similar resources
Improving Word Sense Disambiguation in Lexical Chaining
Previous algorithms to compute lexical chains suffer either from a lack of accuracy in word sense disambiguation (WSD) or from computational inefficiency. In this paper, we present a new linear-time algorithm for lexical chaining that adopts the assumption of one sense per discourse. Our results show an improvement over previous algorithms when evaluated on a WSD task.
One Sense Per Discourse
It is well-known that there are polysemous words like sentence whose "meaning" or "sense" depends on the context of use. We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). As this work was nearing completion, we observed...
Improving Japanese Zero Pronoun Resolution by Global Word Sense Disambiguation
This paper proposes unsupervised word sense disambiguation based on automatically constructed case frames and its incorporation into our zero pronoun resolution system. The word sense disambiguation is applied to verbs and nouns. We consider that case frames define verb senses and semantic features in a thesaurus define noun senses, respectively, and perform sense disambiguation by selecting th...
"One Entity per Discourse" and "One Entity per Collocation" Improve Named-Entity Disambiguation
The “one sense per discourse” (OSPD) and “one sense per collocation” (OSPC) hypotheses have been very influential in Word Sense Disambiguation. The goal of this paper is twofold: (i) to explore whether these hypotheses hold for entities, that is, whether several mentions in the same discourse (or the same collocation) tend to refer to the same entity or not, and (ii) test their impact in Named-...
Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation
Recent advances in word sense induction rely on clustering related words. In this paper, instead of using a clustering algorithm, we suggest to perform a Singular Value Decomposition (SVD) which can be guaranteed to always find a global optimum. However, in order to apply this method to the problem of word sense induction, a semantic interpretation of the dimensions computed by the SVD is requi...
Publication date: 1992